Reflections on “Tempest and Typhoon: User-Level Shared Memory”

Authors

  • Steven K. Reinhardt
  • James R. Larus
  • David A. Wood
Abstract

The Beginnings

The seeds of the project began to germinate in late 1990 and early 1991 with our effort to rapidly prototype large-scale shared-memory multiprocessors. Because other research groups had a one- to two-year lead in their prototyping efforts—and considerably more resources—our project started with the goal of exploiting the parallel computers that our department was acquiring with funding from NSF’s Institutional Infrastructure program.

During this exploratory phase, we made the essential observation that shared-memory systems permit a continuum of implementations, ranging from full hardware support to software simulation/emulation on a message-passing platform. Moreover, in the middle lies a rich collection of mixed hardware/software design alternatives. An internal research note, dated July 9, 1991, roughly classified these alternatives into five levels:

  • Level 0: Software simulation/emulation. At this level, shared-memory programs execute on an unmodified message-passing parallel platform. A program’s loads and stores are replaced with calls to routines that simulate the shared-memory behavior of the proposed design.
  • Level 1: Shared virtual memory. This level incorporates Kai Li’s observation that address translation hardware can be used to map shared-memory references to local pages and detect non-local references, albeit at coarse granularity.
  • Level 2: Fine-grain shared virtual memory. This level observes that shared virtual memory can be implemented at a finer granularity given a mechanism—such as fine-grain “presence” bits—to detect when cache blocks are not stored locally.
  • Level 3: Local hardware support. This level begins to blur the distinction between a test-bed and a prototype. It extends Level 2 with hardware support to initiate requests and handle responses on misses to remote data.
  • Level 4: Remote hardware support. The final level adds hardware support to handle external requests to a node’s memory—that is, a directory controller. This last level encompasses all-hardware implementations.

Initially, we considered these approaches solely as alternatives for evaluating the hardware of interest: a highly integrated, hardware-centric system. This discussion led to the development of the Wisconsin Wind Tunnel (WWT), the parallel simulation system that gave our project its name [9]. The original version of WWT used a parallel message-passing machine (a Thinking Machines CM-5) to simulate a hypothetical shared-memory machine. WWT is a hybrid of Levels 0 and 2, and uses the CM-5’s ECC bits to implement fine-grain valid bits. Memory references that access non-local shared memory cause a trap, because of either a page fault or an intentionally set ECC error. Fine-grain access control allowed direct execution of shared-memory programs, which resulted in a very fast simulator that permitted rapid evaluation of hypothetical shared-memory implementations.
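To make the Level 0/Level 2 idea concrete, here is a minimal C sketch of fine-grain access control using per-block presence bits, with a software bitmap and a stubbed-out "remote fetch." It is illustrative only: names such as shared_load and fetch_block_from_home are hypothetical, and WWT itself synthesized its valid bits from the CM-5's ECC hardware and trap handlers rather than explicit software checks like these.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 32                      /* cache-block granularity */
    #define MEM_BYTES  (1 << 16)
    #define NUM_BLOCKS (MEM_BYTES / BLOCK_SIZE)

    static uint8_t shared_mem[MEM_BYTES];      /* this node's copy */
    static uint8_t remote_mem[MEM_BYTES];      /* stand-in for the home node */
    static uint8_t presence[NUM_BLOCKS / 8];   /* one valid bit per block */

    static int block_present(size_t off) {
        size_t b = off / BLOCK_SIZE;
        return presence[b / 8] & (1u << (b % 8));
    }

    /* Stand-in for a message-passing fetch from the block's home node. */
    static void fetch_block_from_home(size_t off) {
        size_t base = off & ~(size_t)(BLOCK_SIZE - 1);
        memcpy(&shared_mem[base], &remote_mem[base], BLOCK_SIZE);
        size_t b = base / BLOCK_SIZE;
        presence[b / 8] |= (1u << (b % 8));    /* mark block valid */
    }

    /* A compiler or binary rewriter replaces each shared load "x = *p"
     * with a call like this; WWT instead took a hardware trap (page
     * fault or forced ECC error) and ran the same check in the handler. */
    static uint32_t shared_load(size_t off) {
        if (!block_present(off))               /* miss: block not local */
            fetch_block_from_home(off);
        uint32_t x;
        memcpy(&x, &shared_mem[off], sizeof x); /* hit: plain local load */
        return x;
    }

    int main(void) {
        uint32_t v = 42;
        memcpy(&remote_mem[128], &v, sizeof v);         /* data lives remotely */
        printf("%u\n", (unsigned)shared_load(128));     /* miss, fetch, load */
        printf("%u\n", (unsigned)shared_load(128));     /* now a local hit */
        return 0;
    }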

Similar resources

Design and Implementation of a new Synchronization Method for High-Speed Cell-based Network Interfaces

  • A Design for Test Perspective on I/O Management, p. 46
  • Opportunities and Pitfalls in HDL-Based System Design, p. 56
  • Issues on the Architecture and the Design of Distributed Shared Memory Systems, p. 60
  • Design Issues for Distributed Shared-Memory Systems, p. 62
  • The Tempest Approach to Distributed Shared Memory, p. 63
  • Parallel Algorithms for Force Directed Scheduling of Flattened and Hierarchical Sign...

Parallel Computer Research in the Wisconsin Wind Tunnel Project

The paper summarizes the Wisconsin Wind Tunnel Project’s research into parallel computer design and methods. Our principal design contributions—Cooperative Shared Memory and the Tempest Parallel Programming Substrate—seek to balance the programming benefits of a shared address space with facilities for low-level performance optimizations. The project has refined and compared a variety of ideas ...

Adaptive Granularity: Transparent Integration of Fine-Grain and Coarse-Grain Communications

The granularity of sharing is one of the key components that affect performance in distributed shared memory (DSM) systems. Providing only one or two fixed-size granularities to the user may not result in an efficient use of resources. Providing an arbitrarily variable granularity increases hardware and/or software overheads. Moreover, its efficient implementation requires the user to provide some info...

Fine-Grain Distributed Shared Memory on Clusters of Workstations

Shared memory, one of the most popular models for programming parallel platforms, is becoming ubiquitous both in low-end workstations and high-end servers. With the advent of low-latency networking hardware, clusters of workstations strive to offer the same processing power as high-end servers for a fraction of the cost. In such environments, shared memory has been limited to page-based systems...

Tempest: A Substrate for Portable Parallel Programs

This paper describes Tempest, a collection of mechanisms for communication and synchronization in parallel programs. With these mechanisms, authors of compilers, libraries, and application programs can exploit—across a wide range of hardware platforms—the best of shared memory, message passing, and hybrid combinations of the two. Because Tempest provides mechanisms, not policies, programmers ca...
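Tempest's "mechanisms, not policies" split can be pictured with a small hypothetical C sketch. Nothing below is the real Tempest interface: the tag values, register_fault_handler, and the checked load/store are invented stand-ins for Tempest-style fine-grain access control, in which a user-level handler supplies the coherence policy.

    #include <stdio.h>

    typedef enum { TAG_INVALID, TAG_READONLY, TAG_WRITABLE } tag_t;

    #define NUM_BLOCKS 8
    static tag_t tags[NUM_BLOCKS];
    static int   data[NUM_BLOCKS];

    /* Mechanism: the substrate invokes a user-registered handler when
     * an access hits a block whose tag forbids it. */
    static void (*fault_handler)(int block, int is_write);

    static void register_fault_handler(void (*h)(int, int)) {
        fault_handler = h;
    }

    /* Policy, written entirely in user code: upgrade the tag on demand.
     * A real protocol would message the block's home node here; another
     * library could plug a different policy into the same mechanisms. */
    static void on_fault(int block, int is_write) {
        printf("fault on block %d (%s); fetching/upgrading\n",
               block, is_write ? "write" : "read");
        tags[block] = is_write ? TAG_WRITABLE : TAG_READONLY;
    }

    /* Checked accesses: stand-ins for what access-control hardware or
     * instrumented loads/stores would do transparently. */
    static int load(int block) {
        if (tags[block] == TAG_INVALID) fault_handler(block, 0);
        return data[block];
    }

    static void store(int block, int v) {
        if (tags[block] != TAG_WRITABLE) fault_handler(block, 1);
        data[block] = v;
    }

    int main(void) {
        register_fault_handler(on_fault);
        store(3, 7);              /* write fault, then the store proceeds */
        printf("%d\n", load(3));  /* read hit: prints 7 */
        return 0;
    }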

Journal title:

Volume:   Issue:

Pages: -

Publication date: 1994